DOCUMENT RESUME

ED 099 425          95          TM 004 303

AUTHOR          Larsen, Edwin P.
TITLE           Opening Institutional Ledger Books--A Challenge to
                Educational Leadership. Suggestions for Talking to
                School-Community Groups About Testing and Test
                Results. TM Report No. 28.
INSTITUTION     ERIC Clearinghouse on Tests, Measurement, and
                Evaluation, Princeton, N.J.
SPONS AGENCY    National Inst. of Education (DHEW), Washington,
                D.C.
REPORT NO       ETS-TM-28
PUB DATE        Dec 74
CONTRACT        OEC-0-70-3797-519
NOTE            13p.

EDRS PRICE      MF-$0.75 HC-$1.50 PLUS POSTAGE
DESCRIPTORS     *Educational Testing; *Information Dissemination;
                *Manuals; School Districts; Schools; Scores; *Test
                Interpretation; *Test Results; Tests



ABSTRACT 

Three key areas are outlined dealing with the
development of public understanding of testing: (1) Why tests are
administered in schools: needs assessment, instructional program
evaluation, materials selection, reporting to public, documenting
individual growth, diagnostic analysis and planning, and
instructional grouping. (2) Types of tests used, featuring
explanations of achievement tests, scholastic aptitude tests,
interest tests, specialized aptitude tests, and personality tests.
(3) Interpretation of test norms, raw scores, grade equivalent
scores, percentile ranks and stanines, I.Q. scores, and summarizing
results (medians and quartiles). Methods used to chart test results
of a school or district are discussed and suggestions made for the
basic tools needed, the need for minimum use of numbers, and the
facility of percentile ranks. Tables and charts for presenting
statistical information are proposed, and suggestions include
highlighting specific skills, comparing aptitude and achievement, and
charting growth from grade to grade. Finally, in discussing results
and school accountability, the following are proposed: assume
leadership--an advocacy position in identifying discrepancies in
pupil performance (needs), relate results to instructional efforts,
discuss resource needs of the district and school, outline
noninstructional problems the school and community must address, and
approximate accountability. (RC)






ERIC CLEARINGHOUSE ON TESTS, MEASUREMENT, & EVALUATION

EDUCATIONAL TESTING SERVICE, PRINCETON, NEW JERSEY 08540



TM REPORT 28



DECEMBER 1974 



OPENING INSTITUTIONAL LEDGER BOOKS--
A CHALLENGE TO EDUCATIONAL LEADERSHIP

Suggestions for Talking to School-Community Groups
about Testing and Test Results

Edwin P. Larsen



U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
NATIONAL INSTITUTE OF
EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED
EXACTLY AS RECEIVED FROM
THE PERSON OR ORGANIZATION ORIGINATING
IT. POINTS OF VIEW OR OPINIONS
STATED DO NOT NECESSARILY REPRESENT
OFFICIAL NATIONAL INSTITUTE OF
EDUCATION POSITION OR POLICY.



INTRODUCTION 



"Today we expect of the school that the majority 
of students really learn something — a novel, indeed 
unprecedented, demand."[1]

Businesses, particularly large ones, represent an 
idealized prototype of accountability to most of the 
American public. There is seldom a question of whether
the stockholders and/or consumers have a right to
information regarding the performance -- or productivity --
of a multimillion dollar enterprise. The popularity of this
accountability concept is attested to by the wide support 
given to consumer advocacy groups and agencies. 

Education is also big business, both nationally and at 
a local level. Education is our single largest national 
expense, and, because of this, its performance matters a 
great deal. Economists note that formal education in 
schools and colleges accounts for 12 percent of our gross 
national product. In my own community (Oakland,
California), the public schools rank sixth among employers.
In addition, schools have a near corner on the consumer 
market with 90 percent of the pupils attending public 
schools. 

Regardless of the observations of Christopher Jencks,[2]
our economy is now dominated by the "knowledge
worker," so aptly described by Drucker. We have long
passed the time when success within the academic
institution matters little or not at all, when most people
earned their livelihood as skilled craftsmen. Education
will not guarantee success today, but without it success in 
our contemporary society will come with much more 
difficulty. 

[1] Peter F. Drucker, Rise of the Knowledge Worker. Chicago,
Encyclopaedia Britannica, 1971.

[2] Christopher Jencks, et al. Inequality. New York, Basic Books, 1972.



Little wonder then that both school stockholders
(taxpayers) and school consumers (parents and students) ask
school personnel (administrators and teachers) for
information regarding the performance of the business (the
educational program). These questions come in various
forms:

Mother of fourth grader: "I've heard that one-half of the
children in our school read below grade level. Is that
true?"

Father (businessman): "Isn't it true that this new math
program that we have in this school is a miserable
failure? My son can't even add and subtract accurately."

Mother of senior high student: "Friends of ours are
moving to this area. They have a tenth grader and want
to know where to purchase a new home. What is the best
senior high in this area?"

Community action group spokesman: "I believe that this
district has failed to provide equal opportunities and
resources for all youngsters. There are schools in this city
where none of the children can read at all."

These are questions from consumers who do not have the
facts about the academic performance of the students in
the schools. Unfortunately, there are many school staff
members who either do not have the information or are
uncomfortable in discussing it with their constituencies.
All too frequently, the professional establishment seeks
to avoid this whole matter with the statement that "test
data are not well understood and subject to
misinterpretation by an unskilled public."

While there are many valid concerns regarding the
potential misuse and misunderstanding of test scores, it



This publication was prepared pursuant to a contract with the National Institute of Education, U.S. Department of Health, Education
and Welfare. Contractors undertaking such projects under government sponsorship are encouraged to express freely their judgment in
professional and technical matters. Points of view or opinions do not, therefore, represent official National Institute of Education
position or policy.



is the writer's observation that test results are more
frequently underused. I have heard students and parents
express more concern about the nonuse and
unavailability of test scores than about the misuses of results
(labeling, tracking, and so forth).

Because certain legislative leaders in California felt
school professionals were keeping test scores from public
view, they took over the leadership in this area. It boiled
down to "if the school people will not report to the
public, we will." For several years, the State Department
of Education has been required, under various state
testing laws, to publish district-by-district and
school-by-school results of state-required reading and math tests.
There is less and less a question of whether test results
will be disseminated to the public. The critical issue is
"by whom?"

Public release of school-by-school test data by local
school districts is not a common practice. In a number of
school districts known to the writer, school-by-school
averages are not even available to school principals and
teachers. This was the case in my district seven years ago.
In 1967, after two years of greater openness within the
professional ranks, the first public school-by-school
report was made to the Oakland Board of Education.
More and more districts are taking this step. They are
finding that there are many benefits to be realized in an
atmosphere of openness and that the public and
[illegible] are able to understand and use the results more
wisely than was assumed possible.

The purposes of this paper are twofold: to present 
some background concepts regarding testing, scores, and 
statistics and to make some suggestions for communi- 
cating the information to other professionals, parents, 
students, and the public. It is hoped that some of the
concepts and suggestions will be useful to school
personnel intending to assume a leadership position in
promoting better understanding and better uses of tests and
test scores.



GROUP PRESENTATIONS OF TEST RESULTS
A Minicourse in Tests and Measurement



LESSON I: Development of Public
Understanding of Tests and Testing

Following is an outline of the three key areas that should 
be addressed in making a group presentation of test 
data -- whether it is to a professional, staff, or lay group.
(Often one assumes that teachers or principals know
much more about tests, statistics, and scores than they
actually do.) The intent is to give the consumer an
orientation or minicourse that will enable him to
understand the results of a district or individual school. The
depth into which one goes will vary with the group
involved. A full presentation can take as long as an hour.
Briefer presentations, however, are possible and
sometimes necessary. Included in this report are several
prototype figures, illustrations that could accompany a
talk on testing. These figures are relatively easy to
compose and, with modern audiovisual techniques, turn
into overhead transparencies or slides. A few pieces of
colored tape or a few watercolor pen strokes can liven up
the transparencies considerably. It is also possible to cut
out certain pages of this paper and put them through a
transparency-making machine. Transparencies have
proved useful in making a large number of public pre- 
sentations of test results during the past few years. The 
figures will be referred to in the text that follows. 

Background Area A: Why Are Tests Administered
in Schools? There are a number of valid administrative
and instructional reasons for administering group tests.
You are off to a poor start if the only reasons you can give
are that the state or district requires them. The only
justification for administering tests or for anything else
we do in schools is helping children. This does not always
mean an individual pupil case study but may include
using scores to assist in making a number of key 
planning decisions. 

Listed below and illustrated in Figure 1 are some of
the most common reasons for testing children. You will
be well advised to present only those that apply to your
school situation; platitudinous statements are sensed
very quickly as being superficial. Each of these items can
be tied concretely to showing how the information helps
administrators, teachers, and parents to help children.
Educational planning is an enormously complex task.
Test scores provide only one type of input into the
process. A test score, by itself, should never be the sole
basis of a decision. However, test scores can contribute
very useful information in the decision-making process.



1. Needs Assessment. Test scores are useful, objective
indicators of pupil skill levels at a given point in time.
The discrepancy between desired levels of pupil
performance (group or individual) and actual performance
constitutes an educational need or priority. When group
or individual performance is at or above a level judged to
be satisfactory, one has useful confirmation that the
educational program, as it is functioning, is satisfactory.
On the other hand, when group or individual
performance is below acceptable levels, one has a clear
charge to make changes -- more time, different materials,
and/or different teaching strategies.

[Figure 1: Why Are Tests Administered? Labels recoverable
from the original figure: grouping for instruction, needs
assessment, reporting to the public.]
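The discrepancy idea in the needs assessment paragraph can be sketched in a few lines of Python. This is only an illustration: the skill names, the desired level of 70, and the actual scores are hypothetical and do not come from the report, and the name find_needs is invented for the example.

```python
from dataclasses import dataclass


@dataclass
class SkillResult:
    skill: str
    desired: float  # desired level, in hypothetical units (percent of items correct)
    actual: float   # actual group performance, in the same units


def find_needs(results):
    """Return (skill, discrepancy) pairs where actual falls below desired,
    largest discrepancy first."""
    needs = [(r.skill, r.desired - r.actual)
             for r in results if r.actual < r.desired]
    return sorted(needs, key=lambda pair: pair[1], reverse=True)


# Hypothetical group results, for illustration only.
results = [
    SkillResult("reading comprehension", desired=70, actual=58),
    SkillResult("arithmetic computation", desired=70, actual=74),
    SkillResult("language usage", desired=70, actual=66),
]

needs = find_needs(results)
for skill, gap in needs:
    print(f"{skill}: {gap} points below the desired level")
```

Skills at or above the desired level drop out of the list, which matches the point that satisfactory performance confirms the program while a shortfall identifies a priority.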

Test information can be useful to professionals and
laymen in making important decisions regarding their
plans for helping children. Examples can be given in
almost any school or district where discrepancies have
been found and program changes have been instituted or
are planned.

Professionals and laymen sometimes express the
opinion that test scores label children. This is a
superficial observation since it is the professional or layman
who does the labeling. Low performance should be
discussed, not hidden. It is only when the performance
problems are discussed and appropriate actions taken that we



can expect children to benefit or their performance to 
improve. 

2. Evaluation of Instructional Programs. Educators are
constantly searching for improved, more effective ways to
help children. Pre- and posttest scores often provide
critical information on the effectiveness of experimental
and innovative programs. In many instances, schools are
faced with finding programs that will be the most cost
effective. With limited resources schools must find
methods for maximizing the amount of learning
resulting from each teaching hour. Federal and state agencies
are demanding this type of accountability. Test scores
can provide one type of information on the effectiveness
of certain new programs. Examples such as the new 






math programs or experiments in multiage grouping can
be given in most school districts.

3. Selection of Instructional Material. Principals and
teachers face the need to purchase instructional
materials at difficulty levels appropriate for the pupils in
their schools. Accelerated pupils will gain little from
materials geared below their performance levels.
Likewise, pupils performing at average and below-average
levels need materials that are most appropriate to their
current instructional levels. By examining the numbers
of pupils achieving at various performance levels,
administrators and teachers can make informed decisions
regarding the purchase of supplies, textbooks,
workbooks, and other materials best suited to the children.

4. Reporting to the Public (Consumers). Boards of
education and citizens at large, as well as
legislatures, continue to express interest in obtaining periodic
reports of pupil performance. Many educational leaders
have a keen interest in involving and informing the
communities they serve. The purpose may be to build
confidence in effective existing programs or to enlist support
for needed improvements. Recent reports to boards of
education as well as the very presentation being made to
the group involved can be given as examples.

In some communities the increasing involvement of
parents in planning and decision making makes it
imperative that they be well informed. Parent
recommendations founded on hearsay information are hardly
sound bases for actions that will help children.

5. Documenting the Growth of Individual Pupils. Most
parents and teachers have a great deal of information
regarding the performance and growth of individual
pupils. Many students are also aware of their status,
owing to many feedback mechanisms (e.g., grades and
their own observations). However, there are important
instances where nonjudgmental (objective) information is
needed by parents, students, and teachers.

Following are examples of questions that may be
answered much more easily if test scores are available for
individual pupils:

• "If we move to another part of the country, how
will John fit into their program?"

• "I know Mary has good grades, but is she really
college material?"

• "We feel your standards are lower than those of
other schools. Just how well is Sue reading? Is she
really at grade level?"

A Note of Caution. Group tests are not precision
instruments for individuals; group averages are much
more accurate than scores for one pupil. Therefore,
individual scores should be viewed as reflecting approximate,
or general, levels of performance. Some scores may be a
little higher, others a little lower, than the student's true
capacity.
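One way to see why group averages are steadier than individual scores is the usual standard-error relationship: if each pupil's score carries an independent error of about the same size, the error of the average shrinks with the square root of the group size. The sketch below assumes exactly that (independent, equal-sized errors), and the 5-point individual error is a hypothetical figure, not one taken from any test manual.

```python
import math


def error_of_group_average(individual_error, group_size):
    """Approximate error of a group mean, assuming each pupil's score has an
    independent error of the same size (a simplifying assumption)."""
    return individual_error / math.sqrt(group_size)


# Hypothetical individual error of 5 score points.
for group_size in (1, 25, 100):
    error = error_of_group_average(5.0, group_size)
    print(f"group of {group_size}: error of about {error:.1f} points")
```

Under these assumptions a class of 25 has an average that is five times as stable as a single pupil's score, which is why an individual result is best read as an approximate level.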

There should be provisions at all schools for parents to
have access to the scores of their children. Furthermore,
there should be opportunities for pupils to receive
feedback on performance. Methods may vary from school to
school -- it may be a teacher, counselor, or psychologist --
but both parents and pupils have a right to this type of
information, accompanied by wise interpretation and
counsel. Some questions that are raised in an audience
situation should be discussed in an individual
conference. Your policies regarding interpretation of
individual students' test scores should be explained.

6. Diagnostic Analysis and Planning. Most group
achievement tests have been carefully designed to assess
each of the most important skill areas with a sampling of
only a few questions. There are not enough questions in
each area to fully diagnose the strengths and weaknesses
of individual children, but the tabulations of responses of
a class or school group can often give diagnostic leads,
i.e., the skill areas that have and have not been mastered.
In some cases, it may mean reteaching materials already
covered. In others, it may reveal key areas that have been
overlooked.

7. Grouping for Instruction. Group achievement test
scores can be useful in general planning for instructional
groupings and in conjunction with the use of scores to
select appropriate levels of textbooks. By examining the
numbers of pupils performing in various ranges, one can
develop plans for providing differentiated programs
geared to the needs of the pupils.

Another Note of Caution. Group test scores should
never be used as criteria for placing students into one
group or another. It is well known that individual scores
are subject to fluctuation (error). The practice of drawing
a line, or cutting score, is an abrogation of professional
responsibility. On more than one occasion, I have heard
professionals explain to parents that it was not they who
made a placement decision: it was a test score! An
individual's score may be used for counseling or guidance;
discussing how students who have previously entered a
given algebra course with similar levels of entry skills
(scores) have fared may help in making a decision. Or the
score may be combined with other information (such as
grades) and the preferences of the student in arriving at a
pupil-oriented decision.

8. Other Uses. There are probably a number of other
pupil-oriented uses made of test scores in your district.
The foregoing represent only a few of the major areas
that might be used to explain why tests are administered
to pupils in the schools.



Background Area B: What Kinds of Tests Are Used?
There is often an unnecessary air of mystery surrounding
the tests and test scores. Unfortunately, some would like
to keep it that way. It is very important to dispel any
secretive aura that may exist by giving a few facts about
the instruments that have been used. On the following
pages are some approaches, or concepts, that may be
used to explain the nature of the types of tests used most
frequently in the public schools. These are shown in
Figure 2.

1. Achievement Tests. These tests are made up of a
sampling of questions or problems on the types of
materials covered and concepts taught in the
curriculums common in schools throughout the country. They
are not intended to cover only the specific curriculum of
a given school or school district. Rather, the test is
intended to assess how well a youngster can approach
certain problem situations by applying the skills and
concepts he has learned thus far in his academic career.

One should give specific examples of these questions
or problems, items that are similar to those used in the
tests. In my district, we have used orientation booklets
that incorporate 3 to 6 sample questions from each
subtest. These booklets were supplied by the publishers
of the tests. There is seldom any puzzlement or objection
to test content when parents or students see concrete
examples of it. (Copies can be supplied for the CTBS and
ITED upon request to the author.)

Overhead transparencies can be used if printed
examples are not feasible. In some instances, a verbal
explanation may be the best approach: "In a reading
test, children are asked to read short paragraphs similar
to those found in textbooks and to answer questions to
show how well they have understood what has been read.
Arithmetic or mathematics tests contain addition,
subtraction, multiplication, and division problems. Also,
they often contain questions to show how well students
can solve word problems. A writing test usually instructs
pupils to determine whether or not a sentence has proper
punctuation, capital letters, and spelling. Pupils are also
asked to select sentences that express an idea most
clearly."

2. Scholastic Aptitude Tests. This is the second type of
test most often administered in the schools. These tests
are sometimes loosely referred to as intelligence or I.Q.
tests. The test makers themselves prefer to consider these
tests as measuring school aptitudes. The key assumption
here is that a pupil's level of intellectual functioning,
particularly in verbal and numerical skill areas, will give
an indication, or prediction, of the rate of progress or
success a pupil will have in future numerically and
verbally oriented studies. The relationship between past



Figure 2: Types of Tests

EXAMPLES: Reading; arithmetic; language
CONTENTS: Typical problems from basic skill areas; measure
ability to transfer or apply skills learned in schools
KINDS OF SCORES: Grade equivalents; percentiles (based on
national samples of students)

EXAMPLES: School ability; readiness; I.Q.
CONTENTS: Variety of verbal reasoning and numerical
reasoning problems; measure abilities required in most
school courses
KINDS OF SCORES: Percentiles; I.Q. (based on national
samples of students)

EXAMPLES: Vocational inventories; preference tests
CONTENTS: Lists of activities related to a variety of
occupations; students indicate preferences, e.g., outdoor,
persuasive, scientific, and so forth
KINDS OF SCORES: Percentiles or other ranks (based on
samples of students or samples of persons in occupations)

EXAMPLES: Clerical; mechanical; space relations; mathematics
CONTENTS: Reasoning and skill problems in specific skill
areas judged to be critical to success in various
occupations
KINDS OF SCORES: Percentiles (based on samples of students)






achievement and future achievement has been
documented by hundreds of research studies.

Scholastic aptitude tests are more general in nature
than achievement tests, but they must not be considered
as direct measures of innate intelligence in the generic
sense in which intelligence is generally discussed.

The home, school, and an individual's biological
inheritance will have combined to affect how much
verbal and numerical proficiency he has achieved and,
hence, is able to demonstrate on a scholastic aptitude
test. Intelligence is not directly measurable, nor are any
other cognitive functions. Scholastic aptitude is merely
judged by current achievement and/or skill levels.

Scholastic aptitude scores require very careful
interpretation. A statement that a given group of pupils is
performing "as well as can be expected," while accurate
in a certain sense, is subject to much misinterpretation.
Pupils may be performing on reading or arithmetic tests
as well as can be expected, given all the antecedent
conditions that contributed to their present verbal and
numerical skill levels. This does not say that "this is all
the pupils are capable of."

There is, at best, a vague distinction between
achievement and scholastic aptitude tests. Virtually identical
questions may arise concerning both.

In this writer's view, very cautious use should be made
of scholastic aptitude or I.Q. scores. Given a certain level
of general verbal and numerical abilities (a scholastic
aptitude score), is a student able to apply his skills to
problem situations (achievement test items) as well as
other pupils who have approximately the same verbal
and numerical aptitude scores? If a person does not feel
comfortable in dealing with the interpretation of
scholastic aptitude scores, he would be advised to omit
them and deal with achievement scores only.

It is suggested that samples of scholastic aptitude tests 
be shown to an audience so that they will fully under- 
stand the bases of these scores. 

3. Interest Tests. Some school districts administer
vocational interest tests as part of programs designed to
promote vocational readiness and awareness of career
options. Generally, students are asked to express
preferences for various kinds of activities and experiences.
The vocational interest inventory provides scores that
summarize interests into types (persuasive, social
welfare, artistic, and the like) or show how a person's
responses compare to those of persons who are successful
and satisfied in various occupational categories. These
tests are not intended to tell a student what vocation to
pursue. Rather, they yield some facts that can be used
along with student achievement, family values, and so
forth, in making vocational choices.

4. Specialized Aptitude Tests. Some tests are designed to
measure very specialized aptitudes, such as mechanical,
clerical, spatial relationship, mathematical, and abstract
reasoning skills. A report on how well a student has
achieved in these areas may also help him or her make
decisions about entering fields such as engineering or
business.

5. Personality Tests. Instruments that attempt to
measure a person's innermost psychological makeup and
attitudes are seldom administered in public schools.
Parental permission is usually required, and such tests
are administered only by trained psychologists. In many
states, such testing is controlled by law, but some parents
wonder about such "invasions of privacy," and it may be
appropriate to explain local practices concerning this
area of testing.



Background Area C: How Do I Interpret Different
Kinds of Scores and Statistics? Some basic definitions
are in order before moving on to an examination of score
averages for a school and/or a district.

1. Test Norms (Making Comparisons). The most
common types of standardized test scores are those that
compare a pupil's performance with the performance of
children in a national norms group. The publisher of a
test selects a sample of children from all parts of the
nation in order to estimate the average or typical level of
performance of children at a given grade level. This
sampling procedure is similar to the one used in opinion
polls. The information obtained from this sample group
becomes the publisher's norms, and we can determine
how a child or group of children compares with this
sample group.

In California at certain grade levels, a pupil's scores 
can also be compared with scores of other children in the 
same grade throughout the state. The state score pat- 
terns and averages based on testing all children in the 
state at selected grade levels are used as state norms. We 
can tell, for example, how a sixth grader ranks in reading 
ability with other sixth graders throughout the state. 

The score averages of the city provide yet another 
comparison in evaluating local school results. The 
question here is "How well does a pupil or school rank 
within the city?" 

One can see that it is important to keep in mind the
group with which a pupil's performance is being
compared.

2. A Raw Score (Number Correct) Does Not Mean
Much. Two types of scores are generally used to evaluate
test performance. It is probably better to select one or the
other for purposes of presentation. Some ideas in the
following section may be useful in explaining these
derived or converted scores to clients.






Grade Equivalent Scores. Grade equivalent scores,
sometimes called grade placement scores, are often
used because the number of correct answers on a test
has little meaning in itself. A score of 20 on one test
may show first grade performance. On another test the
same score may show high school level skill
development. Much depends on the test's level of
difficulty. In order to develop grade equivalent scores,
publishers administer the tests to children in different
grade levels during a given school month.

The average number correct, or raw score, earned
by first graders in the ninth month of school is
assigned a grade equivalent value of 1.9. The average
raw score for third graders in the ninth month would
be assigned a grade equivalent value of 3.9. We can,
therefore, compare a pupil's raw score with the scores
of other children. One will note in Figure 3 scores such
as 1.3, 5.6, and so on. The number 1.3 refers to the
score earned by an average first grader in the third
month of the school year, and 5.6 means a score level
typical at the sixth month of grade 5. To answer the
question "What is an average score?", one must
indicate the grade and month of the school year in which
the test was administered.
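The grade-and-month reading of a grade equivalent can be made concrete with a short Python sketch. The function names are invented for illustration, and the second function assumes the ten-month school year implied by the decimal notation (tenths running from .0 to .9); neither is taken from a publisher's manual.

```python
def parse_grade_equivalent(score):
    """Split a grade equivalent written as text, such as '5.6',
    into (grade, month)."""
    grade_text, month_text = score.split(".")
    return int(grade_text), int(month_text)


def describe_grade_equivalent(score):
    grade, month = parse_grade_equivalent(score)
    return f"typical of the average pupil in month {month} of grade {grade}"


def months_from_testing_date(score, tested_grade, tested_month):
    """School months between a grade equivalent and the date of testing,
    assuming ten school months per grade (an assumption of this sketch)."""
    grade, month = parse_grade_equivalent(score)
    return (grade * 10 + month) - (tested_grade * 10 + tested_month)


print(describe_grade_equivalent("1.3"))
print(describe_grade_equivalent("5.6"))
# A pupil tested in the third month of grade 3 who earns a 3.9:
print(months_from_testing_date("3.9", tested_grade=3, tested_month=3))
```

The last call shows why the testing date matters: the same 3.9 is six school months above the norm for a pupil tested in month 3 of grade 3 and exactly at the norm for one tested in month 9.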



Figure 3: Interpretation of Grade Equivalent Scores

RAW      GRADE
SCORE    EQUIVALENT    INTERPRETATION

 20        4.3
 19        4.1
 18        3.9         Score earned by average third grader
                       in ninth month of grade 3.
 17        3.7
 16        3.5
 15        3.3
 14        3.1
 13        2.9         Score earned by average second grader
                       in ninth month of grade 2.
 12        2.7
 11        2.5
 10        2.3
  9        2.1
  8        1.9         Score earned by average first grader
                       in ninth month of grade 1.
  7        1.8
  6        1.7
  5        1.6
  4        1.5
  3        1.4
  2        1.3
  1        1.2
  0        1.1





Percentile Ranks and Stanines. Percentile ranks are
also useful in interpreting test performance. Percentile
rank is not the same as percent correct, which is often
seen on teacher-made tests such as in spelling or
arithmetic. A percentile rank indicates the percentage of
students within a reference or norms group whose
scores fell at or below a given score.
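The difference between percent correct and percentile rank is easy to show with a small calculation. The sketch below applies the definition just given (the percentage of the norms group scoring at or below a given score) to a made-up norms group of 20 pupils on a 20-item test; real norms groups are far larger, and these scores are hypothetical.

```python
def percentile_rank(score, norm_scores):
    """Percentage of the norms group whose scores fall at or below `score`."""
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100 * at_or_below / len(norm_scores)


def percent_correct(score, number_of_items):
    return 100 * score / number_of_items


# Hypothetical raw scores of a 20-pupil norms group on a 20-item test.
norm_group = [4, 6, 7, 9, 10, 10, 11, 12, 13, 14,
              14, 15, 16, 17, 17, 18, 19, 19, 20, 20]

raw_score = 15
print(percent_correct(raw_score, 20))          # share of items answered correctly
print(percentile_rank(raw_score, norm_group))  # standing within the norms group
```

With these made-up numbers a raw score of 15 is 75 percent correct but a percentile rank of 60, since 12 of the 20 pupils scored 15 or lower.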

Students differ widely in their physical and mental
performance. If we were to take 100 randomly selected
youngsters from all parts of the nation and ask them to
run a race, we would soon find differences in running
skills. The same is, of course, true with reading,
arithmetic, and other skill areas.

The 50th percentile is considered to be "right on the
grade level" or right at the average of the norms
group. A pupil who is at the 50th percentile in grade 5
is at the average grade level for fifth graders. Similarly,
a pupil with a 50th percentile rank in grade 8 is at the
average grade level for eighth graders but is achieving at
a higher level than the average fifth grade youngster.

In evaluating the test performance of an individual
student, one must keep in mind that a given score is only
one sample of the student's ability to perform in a given
subject area. An individual student's scores would
generally vary considerably if he were tested several times
with tests of equal difficulty levels and covering
essentially the same skill areas. Variations in scores are
generally due to variations: (1) in the way a student
approaches the test (his motivation or his physical
condition at the time of testing); (2) in the way the test was
administered (how well directions were explained or the
general conditions under which the student was
working); and (3) in the tests themselves (tests cannot include
items measuring every skill taught or learned in a given
area; rather, each test contains what is hoped to be a
balanced sampling of items. One test may sample more
of what the student knows than another test may
sample).

Therefore, a percentile rank between 25 and 75 is
usually considered to be within an average range.
Variations of several points in individual student scores should
not be misinterpreted.

Figure 4 may be used to give a graphic illustration of
how a typical group of pupils would score on a 20-item
reading test. The 100 students who are illustrated
represent the distribution or spread of scores within a
hypothetical norms group. Actually, publishers utilize the
scores of several thousand youngsters in deriving
percentile ranks.

One will note that the percentile rank range has been
divided into nine levels called stanines. In evaluating
differences in percentile ranks falling within a given
stanine or stanines, a good rule of thumb is to discount
differences of one stanine level or less. Differences of
more than one stanine level are probably significant
variations in performance. The following example may
serve to illustrate this point:

John received a percentile rank of 28 on the reading
test, which is within the stanine 4 level, and a
percentile rank of 53 on the math test, which is within
the stanine 5 level. His performance on the reading
and math tests should be considered to be
comparable. We have little proof that score differences of this
magnitude reflect true differences in skill levels; they
may have resulted from the sources of error noted
previously.





Mary received a percentile rank of 8 on the math test
(stanine 2 level) and a percentile rank of 26 on the
reading test (stanine 4 level). Mary is probably
significantly more skillful in reading than in math since
her respective percentile ranks differed by more than
one stanine level.
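
The rule of thumb applied to John and Mary can be written out as a short Python sketch. The cut points assume the conventional stanine percentages (4, 7, 12, 17, 20, 17, 12, 7, and 4 percent of the norms group); the function names and the handling of a percentile rank that falls exactly on a cut point are assumptions, and a publisher's manual may draw the boundaries slightly differently.

```python
from bisect import bisect_left

# Cumulative percent of the norms group at the top of stanines 1 through 8
# (assumed conventional cut points; stanine 9 is everything above 96).
STANINE_UPPER_BOUNDS = [4, 11, 23, 40, 60, 77, 89, 96]


def stanine(percentile_rank):
    """Convert a percentile rank (1 to 99) to a stanine level (1 to 9)."""
    if not 1 <= percentile_rank <= 99:
        raise ValueError("percentile rank must be between 1 and 99")
    return bisect_left(STANINE_UPPER_BOUNDS, percentile_rank) + 1


def probably_different(rank_a, rank_b):
    """Rule of thumb: only a gap of more than one stanine level counts."""
    return abs(stanine(rank_a) - stanine(rank_b)) > 1


print(stanine(28), stanine(53), probably_different(28, 53))  # John: 4 5 False
print(stanine(8), stanine(26), probably_different(8, 26))    # Mary: 2 4 True
```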

• I.Q. Scores. For many years the term I.Q. has been a
common word in our society. Unfortunately, the
meaning of this term is not well understood. An I.Q.
score is just another type of score used to indicate
where a person's performance ranks in comparison
with the performance of individuals in the publisher's
norms groups.

I.Q. tests are not intended to be measures of native
intellectual ability, as is sometimes assumed. Rather,
the ones generally administered in schools sample
student skills in verbal and numerical reasoning areas.
A better example of the group "I.Q." tests used in
schools is probably a scholastic aptitude test.

An I.Q. score of 100 is used by test makers to
represent the score level of an average person in a given age
group. Approximately 68 percent of a given age group
receive I.Q. scores between 84 and 116 on most
modern scholastic aptitude tests. The I.Q. score
system is based on statistical assumptions that are not
commonly understood. Therefore, it is probably easier
to grasp where a person ranks on a scholastic aptitude
or I.Q. test when scores are expressed as percentile
ranks rather than as I.Q.s.
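
For readers who want to see the conversion, the sketch below turns an I.Q. score into an approximate percentile rank. It assumes that scores follow a normal (bell-shaped) curve with an average of 100 and a standard deviation of 16, which is one common reading of the "68 percent between 84 and 116" figure; a particular publisher's scale may differ, so the norms tables in the test manual remain the authority. The names used are hypothetical.

```python
from statistics import NormalDist

# Assumption: normal curve, average 100, standard deviation 16.
IQ_SCALE = NormalDist(mu=100, sigma=16)


def iq_to_percentile_rank(iq):
    """Approximate percentile rank (1 to 99) for an I.Q. score."""
    rank = round(100 * IQ_SCALE.cdf(iq))
    return min(99, max(1, rank))


# Share of an age group expected between 84 and 116 under the assumption.
share_84_to_116 = IQ_SCALE.cdf(116) - IQ_SCALE.cdf(84)

print(round(100 * share_84_to_116))  # about 68 percent
print(iq_to_percentile_rank(100))    # average score -> 50
print(iq_to_percentile_rank(116))    # -> 84
print(iq_to_percentile_rank(84))     # -> 16
```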

3. Summarizing Results (Medians and Quartiles). As we
have seen, there is almost always quite a spread in pupil
scores. Some pupils may get nearly all questions correct,
others very few. The distribution of scores on a reading
test, as shown on the following page, will illustrate this fact.
To summarize this array of scores, median and quartile
statistics are usually used.

• Q1, or the first quartile, indicates the point at or below
which one quarter or 25 percent of the pupils fell.
Three-fourths of the pupils scored higher.

• The median, or second quartile, indicates the
midpoint in the score range. One half of the students are
above and one half below this point. This is important
to keep in mind. A median obviously does not reflect
the scores of all children.

• Q3, or the third quartile, indicates the point at or
below which three quarters or 75 percent of the
children have scored. One-fourth have scored higher.

Figure 5 can be used to show a sample tally of scores
for a group of pupils. One can also show the quartiles
and the publisher's grade equivalents and percentile
ranks for each quartile.
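
The quartile points can be found from a sorted list of raw scores, as in the minimal Python sketch below. It follows the "at or below" wording used above; the function names and the twelve scores are hypothetical, and scoring services may use slightly different conventions when a quartile falls between two scores.

```python
from math import ceil


def quartile_point(scores, fraction):
    """Smallest score with at least `fraction` of the pupils at or below it."""
    if not scores:
        raise ValueError("no scores to summarize")
    ordered = sorted(scores)
    position = max(1, ceil(fraction * len(ordered)))
    return ordered[position - 1]


def summarize(scores):
    """Return the three quartile points used in the charts."""
    return {
        "Q1": quartile_point(scores, 0.25),
        "median": quartile_point(scores, 0.50),
        "Q3": quartile_point(scores, 0.75),
    }


# Hypothetical raw scores for 12 pupils.
scores = [3, 5, 8, 8, 9, 11, 11, 12, 14, 14, 15, 18]
print(summarize(scores))  # {'Q1': 8, 'median': 11, 'Q3': 14}
```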







[Figure 5. Sample tally of scores with the quartile points marked.
Q3: three-fourths, or 75%, were at or below this score (GE 4.4).
Median or Q2: one-half, or 50%, were above or below this score.
Q1: one-fourth, or 25%, were at or below a score of 8 (GE 3.2).]



LESSON II: Charting Your Results

The specific method used to chart, or present, the test
results of a school or district will depend on the points
one wishes to highlight. The illustrative charts described
in this section show only some of the comparisons or
analyses that can be made. In most instances, the
audience is assumed to be an individual
school-community group.

Background Considerations: Preplanning. There
are a number of things to consider in preplanning:

1. Basic Tools. You will need a frequency distribution of
the district and/or school results you intend to present. If
one is not available through a district or commercial
scoring service, a hand tally will need to be done since
virtually any summary presentation focuses on the
distribution of scores. You will also need the quartile points
(Q1, Q2 = median, and Q3). Raw scores are the best
starting point for summarizing the results for any test, but to
become meaningful they must be converted to
percentiles, grade equivalents, or another norm-referenced
score system.

The other basic item you will need is a test manual
and/or handbook that contains the norms tables for the
test for the time of year at which the tests were
administered.
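
Where scores are available on a computer rather than on paper, the hand tally can be approximated with a short Python sketch such as the one below. The function name and the ten sample scores are hypothetical; the output is simply each raw score, the number of pupils who earned it, and the cumulative percent of pupils at or below it.

```python
from collections import Counter


def frequency_distribution(raw_scores):
    """Return (score, tally, cumulative percent) rows, lowest score first."""
    if not raw_scores:
        raise ValueError("no scores to tally")
    tallies = Counter(raw_scores)
    total = len(raw_scores)
    rows = []
    running = 0
    for score in sorted(tallies):
        running += tallies[score]
        rows.append((score, tallies[score], 100 * running / total))
    return rows


# Hypothetical raw scores for 10 pupils.
sample = [8, 11, 11, 14, 8, 11, 16, 14, 11, 9]
for score, tally, cumulative in frequency_distribution(sample):
    print(score, "#" * tally, cumulative)
```

The cumulative percent column is what one reads to locate the quartile points: the first score at which it reaches 25, 50, and 75.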

2. Minimize the Use of Numbers. Use graphic
representations of percentile ranks, grade equivalents, or I.Q.
score ranges whenever possible. Converted scores are
abstractions, and graphic presentations generally give
most of the information staff and parents want.

Quartiles are commonly used and easily understood
summary statistics. Bar graphs showing the range
between the first and third quartiles as well as the
median are simple and direct methods of summarizing
results. The use of medians without the presentation of
first- and third-quartile data tends to give a very narrow
presentation of the data. It is critical that parents and
staff recognize the variability in student performance not
only at the local level but also within the norms group. By
focusing on the performance of the middle 50 percent of
the population, one is able to moderate the "everyone
should be above grade level" concept that dominates the
thinking of many individuals. Everyone in a group just
cannot be above the middle level performance of the
group.

Several of the sample charts discussed in the next
section utilize simple bars to show the interquartile
range. This is the range between Q1 and Q3. The
median, or Q2, is shown by a horizontal line across the
bar.

3. Percentile Ranks Are Easier. To chart results for any
test at any grade level or for multiple grade levels, a
standard chart format can be used. Furthermore, the
norms group quartiles are always located at constant
reference points (Q1 = 25th percentile; Q2 = 50th
percentile; Q3 = 75th percentile). If an audience can grasp this
standard frame of reference, it will be much easier to
present data for different tests and grade levels. In
addition, percentile ranks are a valid base for comparison of
results for different tests or subtests.

Grade equivalents may be more appropriate for your
purposes, but the development of bar graphs is much
more difficult because of the broad range of scores (e.g.,
a first grade test may yield scores ranging from 1.0 to 5.0;
a junior high school level test may yield scores from 2.0 to
13.0).

Sample Tables and Charts. The following tables and
charts should give some idea of ways to present statistical
information.

1. Results for a Single Test: Comparing Norms Group,
District, and School Results. Figure 6 shows a sample of
a combined statistical (numeric) and graphic
presentation. It will be noted that results are shown for a single
test at a single grade level. Since percentile ranks are
used, no elaborate scaling is needed for the chart. The
key here is the approximate relationship of the local
interquartile range to the norms group. Only enough
numbers are given to define the quartiles. One can see at
a glance that, while the district median is two percentile
points below the 50th percentile rank of the norms
group, that is not the performance level of the entire
school population. In fact, nearly one-fourth of the
district population is scoring in the top quartile range of the
norms group.

[Figure 6. Percentile rank chart: interquartile range bars with the median marked for the norms group, district, and school.]

[Figure 7. The same data in grade equivalent form: interquartile range bars with the median marked for the norms group, district, and school.]

The same hypothetical data are presented in grade
equivalent form in Figure 7. To determine the grade
equivalent values for the quartiles (Q1, Q2, and Q3) of
the norms group, use the publisher's test manual. Find
the raw score equivalent for the 25th, 50th, and 75th
percentiles, respectively. Then convert these raw score
values into grade equivalents. These grade equivalent
values will determine the upper and lower limits of the
norms group bar and the median.

2. Highlighting Specific Skills: Comparing
Performance on Subtests. Figure 8 illustrates a data
presentation intended to analyze the comparative
performance of students in different skill areas. In this
sample, subtests from an arithmetic test are used.
Results of subtests such as reading, spelling, and
language usage could also be compared in this manner.

A school staff or board of education or parent group
could see quickly the relative strengths and weaknesses
of the groups tested. For example, students in both the
district and school performed slightly higher than the
norms group in Concepts, somewhat below in
Computation, and slightly below in Applications skills. The
overall performance of the district was only slightly
below that of the norms group; the school's performance
approximated that of the norms group at median and
third quartile points but showed proportionately more
youngsters in the lowest percentile rank levels than the
norms group. These data might be of particular interest
if a new mathematics curriculum were under study. They
also might help a mother understand her child's
individual test profile.

[Figure 8. Subtest Scores: district and school results compared by subtest.]

3. Comparing Aptitude and Achievement. In Figure 9,
selected percentile rank ranges, called stanines, have
been used to present the comparative performance of the
publisher's norms groups and that of a school's student
population on an aptitude test and an achievement
battery. By definition there are set percentages of
students scoring within each stanine level in the
publisher's norms groups. Some districts utilize stanines as
the basic standard score for reporting individual and/or
group scores, so this might be an appropriate format in
some instances.
Key points that can be illustrated by Figure 9 are: 

• Students show varying levels of scholastic aptitude and 
achievement in the publisher's norms population as 
well as at a given school. 

• Even though average or median performance may be
above, at, or below that of the norms group, there are
diverse student needs in the district.

• Students' skill levels (achievement scores) are slightly 
lower than general scholastic aptitude scores would 
suggest they are currently capable of. 

• It is relatively simple to plot the distribution of this
school's performance in curve form during a
presentation using a transparency marker. The curve would
show graphically the levels at which the school's score
patterns differ from those of the norms population.

At the sample school, median performance for all tests
would be in the fourth stanine range. For each test
presented we reach 50 percent of the school's population
after we get out of stanine 3 and before we get to stanine
5. For example, consider SCAT. We add the 8, 17, and 15
percent in stanines 1, 2, and 3 respectively, and we have
40 percent. Since 31 percent of the school's population
falls in stanine 4, we reach the median somewhere in this
stanine range. However, there are youngsters whose
achievement levels vary substantially from the median
level. A small proportion are performing at substantially
above average levels. For example, 9 percent of this
school's population are in stanines 7, 8, and 9 on STEP
Reading. A larger proportion of the school's youngsters
are performing at below average levels: on STEP Math,
for instance, 46 percent of the youngsters are in stanines
1, 2, and 3.
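
The running total described for SCAT can be sketched in Python as follows. The first four percentages (8, 17, 15, and 31) are the ones quoted above; the percentages shown for stanines 5 through 9 are hypothetical fill-ins chosen only so that the list adds to 100, and the function name is likewise an assumption.

```python
def median_stanine(percent_by_stanine):
    """Return the stanine in which the running total first reaches 50 percent."""
    running_total = 0
    for level, percent in enumerate(percent_by_stanine, start=1):
        running_total += percent
        if running_total >= 50:
            return level
    raise ValueError("percentages do not reach 50 percent")


# Stanines 1-4 from the SCAT example; stanines 5-9 are hypothetical values.
scat_percentages = [8, 17, 15, 31, 14, 8, 4, 2, 1]

print(sum(scat_percentages[:3]))         # 40 percent through stanine 3
print(median_stanine(scat_percentages))  # median falls in stanine 4
```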

4. Charting Growth From Grade to Grade (Norms
Group and District). There are occasions when it is
desirable to show data on student progress from grade to
grade. Unfortunately, few school districts have the
resources necessary to compile student-by-student data
over many years.



[Figure 9. Aptitude and achievement by stanine level (1 through 9), with the below average, average, and above average ranges marked. The percentages of the publisher's norms population scoring at each level (all tests) are compared with the percentages of the school's population scoring at each level on SCAT, STEP Math, STEP Reading, and STEP Writing.]




Cross-sectional data for various grade levels tested in
the same year or cross-sectional data for a given grade
cohort tested year after year can be used to approximate
such growth studies. As a matter of fact, these types of
data are probably as valid as any, since the norms data
for developing grade equivalents, percentiles, and so
forth are cross-sectionally derived.

Figure 10 suggests a method for charting the
interquartile ranges of data described above. Such a
presentation may be useful in showing a variety of trends,
such as:

• grades during which students are or are not
showing desired amounts of growth;

• students who may be below grade level are,
nevertheless, making substantial growth gains.



LESSON III: Discussing Your Results
How Accountable Are Schools for Test Scores?

How accountable the schools are is obviously an
unanswerable question, at least in simple terms. However,
the following suggestions are presented for those who
may find themselves in the accountable spotlight.

1. Do Not Explain the Test Scores Away. If you
emphasize that the tests were not administered properly or that
you are convinced that they are totally irrelevant, you will
immediately confirm the suspicion of many consumers.

2. Assume Leadership: An Advocacy Position in
Identifying Discrepancies in Pupil Performance (Needs).
This is not to say that one can promise instant
remediation of performance problems. However, I have seldom
seen a person attacked by a consumer group if he is
showing an intention to act in response to these needs.

3. Relate Results to Instructional Efforts. The results
mean little in isolation. Even though strong nonschool
factors may be prominent influences, one can hardly take
the position that the school program is not responsible,
at least in part, for student performance. Known
program strengths and weaknesses, highlighted by the
results, should be discussed.



[Figure 10. Grade-to-Grade Growth: the district interquartile range charted against the norms average for grades 1, 2, 3, 6, 9, and 12.]



4. Discuss Resource Needs of the District and/or School.
Needs for supplies, facilities, and other fiscally
constrained items should be noted. On the other hand, one
may point to benefits derived from resources made
available in the past.

5. Outline Noninstructional Problems the School and
Community Must Address. Some of these might include:

• Absenteeism, including excused and unexcused
absences. These often constitute significant problems
in a number of schools.

• Physical well-being of pupils, including problems
associated with nutrition, adequacy of clothing, and
treatment of major medical problems.

• Environmental complement to school instruction,
including the paramount importance of parent interest
in the child's school activities, encouragement of
reading at home, and the importance of discussing with and
explaining to children (building language competence).

• Pupil interest and motivation 

• Pupil mobility 

6. Approximate Accountability. Describe the school's
efforts in attempting to meet some of the problems listed
above. Communicating the feeling that the schools are
also recognizing problems related to instruction (e.g.,
"This school needs to strengthen its arithmetic
computation skills.") as well as problems attributed to the
population they serve is essential to building confidence in the
schools. Judiciously acknowledging weaknesses where
they exist and indicating that efforts are being made to
correct them are the strongest positions that can be
taken. This may be the closest the schools can come to
being accountable. I believe that it is about all that our
consumers expect of us.






